The boosting: A new idea of building models
Authors
Abstract
The idea of boosting is deeply rooted in everyday practice, and it shapes how we think about chemical problems and how we build chemical models. Mathematically, boosting is an iterative reweighting procedure: a base learner is applied sequentially to reweighted versions of the training data, where the current weights are adjusted according to how accurately the previous learners predict each sample. By using different loss criteria, boosting copes not only with classification problems but also with regression problems. In this paper, the basic idea and the commonly used boosting algorithms are discussed in detail, and applications to two datasets illustrate the strong performance of boosting.

Boosting, based on an ensemble of individual models, is one of the most powerful learning methods introduced in the last ten years. It stems from the PAC (probably approximately correct) learning framework [1] or the concept of ensemble learning [2,3], and was originally designed for classification problems. Building on PAC theory, Kearns and Valiant [4] were the first to ask whether a "weak" learning algorithm that performs only slightly better than random guessing can be "boosted" into an arbitrarily accurate "strong" learning algorithm. This question forms the foundation of boosting. The underlying idea is to combine the outputs of many "weak" learners into a powerful "committee", where a weak learner is an algorithm whose error rate is only slightly better than random guessing. Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions [5–7]. Notably, the ensemble obtained through boosting can reduce the variance and the bias of a model simultaneously.

The first provable polynomial-time boosting algorithm, which suffered from certain practical drawbacks, was developed in 1990 by Schapire [8] in the PAC learning framework. The AdaBoost algorithm, introduced in 1995 by Freund and Schapire [9], solved many of the practical difficulties of the earlier boosting algorithms and, as the most popular boosting procedure, has since drawn much attention [10–16]. To explain why AdaBoost performs well in most cases, Friedman et al. [17,18] analyzed AdaBoost statistically, derived the exponential criterion, and showed that it estimates the log-odds of …
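To make the reweighting procedure concrete, the following is a minimal sketch of discrete AdaBoost with decision stumps written in plain NumPy. The function and variable names are ours, not taken from the paper, and the stump learner is deliberately simplistic; it is an illustration of the idea, not a reference implementation.

```python
import numpy as np

def fit_stump(X, y, w):
    """Fit a one-split decision stump minimizing weighted 0/1 error.
    y is assumed to take values in {-1, +1}; w are the current sample weights."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)                  # (error, feature, threshold, polarity)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost: sequentially up-weight samples the earlier stumps misclassify."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # start from uniform weights
    ensemble = []                               # list of (alpha, feature, threshold, polarity)
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against degenerate errors
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this weak learner
        pred = pol * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)          # misclassified samples gain weight
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    """Weighted vote (the 'committee') of all weak learners."""
    score = np.zeros(len(X))
    for alpha, j, thr, pol in ensemble:
        score += alpha * pol * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(score)
```

On a toy two-class problem the weighted committee typically outperforms any single stump, even though each stump is only slightly better than random guessing. In practice one would rely on an established implementation such as scikit-learn's AdaBoostClassifier rather than a hand-rolled loop like this.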
Related articles
A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System
In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse detection component; for incremental misuse detection, it can identify new classes of intrusions that do not exist in the training dataset. As...
Outlier Detection by Boosting Regression Trees
A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation across the boosting iterations and to reiterate after removing it. The selection criterion is based on Tchebychev's inequality applied to the maximum over the boosting iterations of ...
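The counting idea can be illustrated with a rough sketch: boost small regression trees on residual-weighted resamples, record how often each observation is drawn, and flag the most frequently drawn one when its count is extreme by a Chebyshev-style bound. The resampling scheme, the learning rate, and the exact threshold below are our assumptions for illustration, not the cited procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def flag_candidate_outlier(X, y, n_rounds=200, k=3.0, seed=0):
    """Illustrative only: count resampling frequency over boosting rounds and
    flag the most frequently resampled observation if its count is extreme."""
    rng = np.random.default_rng(seed)
    n = len(y)
    counts = np.zeros(n)
    pred = np.zeros(n)
    for _ in range(n_rounds):
        resid = y - pred
        w = np.abs(resid) + 1e-12            # assumption: resample proportionally to |residual|
        w /= w.sum()
        idx = rng.choice(n, size=n, replace=True, p=w)
        counts += np.bincount(idx, minlength=n)
        tree = DecisionTreeRegressor(max_depth=2).fit(X[idx], y[idx])
        pred += 0.1 * tree.predict(X)        # small learning rate for the boosting update
    worst = int(np.argmax(counts))
    # Chebyshev-style rule: a count more than k standard deviations above the
    # mean has probability at most 1/k^2 for a "typical" observation.
    if counts[worst] > counts.mean() + k * counts.std():
        return worst
    return None
```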
Using the Reaction Delay as the Driver Effects in the Development of Car-Following Models
Car-following models, the most popular form of microscopic traffic flow modeling, are increasingly being used by transportation experts to evaluate new Intelligent Transportation System (ITS) applications. A number of factors, including individual differences in age, gender, and risk-taking behavior, have been found to influence car-following behavior. This paper presents a novel idea to calculate ...
Improving reservoir rock classification in heterogeneous carbonates using boosting and bagging strategies: A case study of early Triassic carbonates of coastal Fars, south Iran
Accurate reservoir characterization is a crucial task for the development of quantitative geological models and reservoir simulation. In the present work, a novel view of reservoir characterization is presented that exploits the advantages of thin-section image analysis and intelligent classification algorithms. The proposed methodology comprises three main steps. First, four classes of...
A New RSTB Invariant Image Template Matching Based on Log-Spectrum and Modified ICA
Template matching is a widely used technique in many image processing and machine vision applications. In this paper, we propose a new, fast, and reliable template matching algorithm that is invariant to Rotation, Scale, Translation, and Brightness (RSTB) changes. For this purpose, we adopt the idea of the ring projection transform (RPT) of an image. In the proposed algorithm, two novel s...